Abstract (Cont.)

by  the  binaural  processor,  empirical  and  theoretical  evaluations  of  the  efficiency 
of psychophysical procedures, and hardware and software developments to aid psychoacoustic
research. Overall, the work examined issues and models of contemporary
interest and thus has implications for auditory theory in general and for the study
of auditory pattern analysis and auditory masking in particular.

I. RESEARCH OBJECTIVES

The  ultimate  goal  of  the  project  is  to  specify  the  transformations  of  the  auditory  stimulus 
used  by  the  subject  to  determine  the  presence  or  absence  of  a  signal  in  an  auditory  masking 
task,  with  particular  emphasis  on  the  role  of  processes  that  compare  information  in  the 
frequency  domain  and  in  the  time  domain,  and  on  the  relation  between  monaural  and 
binaural  effects.  The  study  of  auditory  masking  phenomena  has  far-reaching  implications: 

1) because, while all of auditory theory is based in whole or in part on the results of
masking  studies,  the  mechanisms  underlying  these  phenomena  are  poorly  understood;  2) 
because  the  noise  reduction  strategies  used  by  the  auditory  system  represent  a  basic  form  of 
auditory  pattern  analysis,  which  must  be  addressed  when  modeling  more  complex  auditory 
processing;  3)  because  it  is  often  critical  to  specify  and  to  optimize  human  performance  in 
noisy  environments  or  through  degraded  communication  channels;  and  4)  because  the 
damaged  ear  has  particular  problems  in  noise,  for  which  clinical  solutions  must  be  found. 

Within  our  approach  it  is  assumed  that  the  behavior  of  the  subject  can  be  modeled  by  a 
system that on each trial computes a single "decision variable," which in the manner
described  by  the  Theory  of  Signal  Detectability  provides  the  basis  of  the  subject's  decision 
about  the  presence  or  absence  of  the  signal.  Within  this  framework  the  researcher's  task  is 
to  specify  this  decision  variable.  For  the  tone-in-noise  detection  tasks  we  have  been 
investigating,  classical  models  would  argue  that  the  decision  variable  should  correspond 
closely  to  the  stimulus  energy  within  a  narrow  frequency  band  centered  around  the  signal 
(i.e., the critical band) and within a brief temporal window that contains the signal. In this
project traditional psychophysical procedures were combined with "new" techniques
("molecular psychophysics") in order to demonstrate that classical models of masking are
oversimplifications,  to  develop  models  that  provide  a  more  accurate  description  of  the 
responses  of  the  subject,  and  to  delineate  the  relations  between  the  mechanisms  underlying 
monaural  masking  and  those  underlying  binaural  masking.  A  further  aim  of  this  project  has 
been to develop and evaluate devices, software, and procedures to facilitate psychophysical
research.

II.  SUMMARY 

A  number  of  psychophysical  studies  have  been  performed  to  examine  the  phenomena  of 
monaural  and  binaural  masking.  Both  single-channel  and  multichannel  models  have  been  fit 
to  the  data  from  "molecular"  psychophysical  studies.  The  multiple  detector  models  provide 
a better fit to the molecular data and, along with the results of a number of experiments
employing traditional techniques, indicate that subjects determine the presence of the signal
based on a comparison of information across different spectral/temporal regions of the
stimulus. These results are compatible with other findings in the current literature, suggest a
more  global  analysis  of  the  stimulus  than  has  typically  been  assumed,  and  provide  a 
quantitative  description  of  these  phenomena.  Other  significant  results  include  a  more 
complete  description  of  internal  noise  processes,  evidence  that  the  external  masker  is  not 
cancelled  by  the  binaural  processor,  evidence  that  similar  models  can  be  used  to  predict 
monaural  and  binaural  masking  data,  and  evidence  that  remote  masking  and  suppression  may 
be  related  phenomena.  In  addition,  adaptive  staircase  techniques  have  been  examined,  a 
software  noise  generator  has  been  implemented,  and  a  device  to  control  the  presentation  of 
high  quality  sound  in  auditory  experiments  has  been  designed  and  implemented. 

III.  STATUS  OF  THE  RESEARCH 

Our  research  on  monaural  masking  and  on  binaural  masking  proceeds  in  parallel,  with 
considerable  interdependence.  Additional  support  for  this  research  has  been  provided  by 
grants from NSF (BNS-85-11768) "Analysis of models of auditory masking," period of
support July 15, 1985 through July 14, 1988, R.H. Gilkey, PI, and (BNS-87-20305)
"Analysis of models of auditory masking," period of support January 1, 1988 through June
30, 1989, R.H. Gilkey, PI.

Profile  analysis  in  noise 

In  an  experiment  on  diotic  masking  we  investigated  the  effect  of  randomizing  the  overall 
level  of  the  stimulus  on  the  detectability  of  a  relatively  brief  tonal  signal,  as  a  function  of  the 
duration  and  the  bandwidth  of  the  masking  stimulus.  If  the  subject  bases  his  decision  on  the 
energy  in  a  single  critical  band  and  a  single  temporal  integration  window,  it  should  be 
possible  to  disrupt  his  performance  by  randomizing  the  overall  level  of  the  stimulus  (thus 
randomizing  the  energy  in  that  critical  band  and  temporal  integration  window).  The 
approach is based on the profile analysis experiments of Green [Am. Psychol. 38: 133-142,
1983]. Here, however, the background is random noise rather than a tone complex. When
the  masker  is  narrowband  and  short  in  duration,  there  is  a  substantial  effect  of  randomizing 
overall  level.  However,  when  the  masker  is  wideband  and  long  in  duration,  the  effect  of 
randomizing  overall  level  is  negligible.  Apparently,  some  information  present  in  the  spectral 
fringe  (that  portion  of  the  masker  that  does  not  overlap  with  the  signal  in  the  frequency 
domain)  and  temporal  fringe  (that  portion  of  the  masker  that  does  not  overlap  with  the 
signal  in  the  temporal  domain)  can  be  used  to  overcome  the  effects  of  randomizing  overall 
level.  When  the  bandwidth  and  the  duration  of  the  masker  are  manipulated  separately,  the 
results  suggest  that  the  information  in  the  frequency  domain  is  most  important.  These 
results  call  into  question  the  classical  critical  band  interpretation  of  auditory  masking.  Green 
[op. cit.] suggests that subjects in his profile analysis task base their decisions on an analysis
of  spectral  shape.  Our  results  suggest  that  a  similar  type  of  analysis  may  be  occurring  in 
tone-in-noise  masking.  Said  differently,  by  comparing  information  in  different 
spectral/temporal  regions,  the  subject's  decision  variable  can  be  scaled  to  be  independent  or 
nearly  independent  of  overall  level.  The  results  were  described  in  talks  presented  at  CID 
and  at  the  meeting  of  the  AFOSR  program  in  Auditory  Pattern  Recognition,  and  are 
included in Gilkey ["Spectral and temporal comparisons in auditory masking," in W.A. Yost
and C.S. Watson (eds.), Auditory Processing of Complex Sounds, 26-36, 1987].
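
The level-invariance argument can be illustrated with a short numerical sketch. It is not the analysis used in the experiments: the function names, the 50-Hz signal band, the 400-Hz spectral fringe on each side, and the 20-dB rove are all assumptions chosen for illustration. The sketch compares a single-band energy statistic with a statistic that references the signal band to the spectral fringe.

```python
import numpy as np

def band_energy(x, fs, f_lo, f_hi):
    """Energy of x between f_lo and f_hi (Hz), taken from the FFT power spectrum."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.sum(np.abs(spectrum[in_band]) ** 2))

def decision_variables(x, fs, f_sig=500.0, band_width=50.0, fringe_width=400.0):
    """Return (signal-band energy in dB, signal band re: spectral fringe in dB)."""
    lo, hi = f_sig - band_width / 2, f_sig + band_width / 2
    e_band = band_energy(x, fs, lo, hi)
    e_fringe = (band_energy(x, fs, lo - fringe_width, lo)
                + band_energy(x, fs, hi, hi + fringe_width))
    band_db = 10 * np.log10(e_band)
    return band_db, band_db - 10 * np.log10(e_fringe)

rng = np.random.default_rng(0)
fs = 8000                                  # sampling rate in Hz (assumed)
noise = rng.standard_normal(fs // 2)       # 500-ms wideband masker sample
rove_db = 20.0                             # hypothetical change in overall level
roved = noise * 10 ** (rove_db / 20)

band_a, ratio_a = decision_variables(noise, fs)
band_b, ratio_b = decision_variables(roved, fs)
print(f"single-band statistic shifts by {band_b - band_a:.2f} dB")
print(f"band-re-fringe statistic shifts by {ratio_b - ratio_a:.2f} dB")
```

Under these assumptions the single-band statistic moves with the rove while the fringe-referenced statistic does not, which is the sense in which a wideband masker can protect performance against level randomization.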

An experiment analogous to that of Gilkey [op. cit.] was conducted using the dichotic
N0Sπ stimulus configuration. Again, the effect of randomizing overall level was much
smaller  under  the  wideband  long-duration  condition  than  under  the  narrowband  short- 
duration  condition.  Again,  when  the  effects  of  duration  and  bandwidth  were  investigated 
separately,  the  results  suggest  that  the  information  in  the  spectral  fringe  is  most  important. 
The  presence  of  an  effect  of  randomizing  overall  level  under  the  narrowband  short-duration 
conditions  is  difficult  to  explain  by  simple  models  that  are  based  solely  on  interaural 
differences.  That  is,  even  though  the  overall  level  is  randomized,  the  interaural  differences 
are  not.  Thus,  in  order  to  explain  these  data  a  model  would,  at  a  minimum,  have  to 
postulate  the  presence  of  multiplicative  internal  noise.  On  the  other  hand,  the  fact  that  the 
effect  of  randomizing  level  is  eliminated  when  the  bandwidth  of  the  masker  is  wide  suggests 
that  the  classical  critical  band  interpretation  of  auditory  masking  is  inadequate  and  that  some 
process  compares  information  in  different  spectral/temporal  regions.  These  results  were 
presented  at  the  meeting  of  the  AFOSR  program  in  Auditory  Pattern  Recognition. 

Binaural  temporal  masking 

If  the  interaural  phase  of  a  noise  masker  is  switched  during  the  observation  interval  from 
in phase (N0) to 180° out of phase (Nπ) or from Nπ to N0, a brief interaurally out-of-phase
signal (Sπ) will be about 15 dB more detectable in the N0 portion of the noise than in the
Nπ portion. By investigating the change in detectability as a function of the delay (Δt)
between the onset of the signal and the phase transition in the noise, the temporal response
of the binaural system can be evaluated. The results of this case can be contrasted with a set
of conditions in which the interaural phase of the noise is held constant (Nπ), but the level
of the noise is reduced or increased by 15 dB halfway through the observation interval.
Within a model such as the E-C model [N.I. Durlach, "Binaural signal detection:
Equalization and cancellation theory," in J.V. Tobias (ed.), Foundations of Modern Auditory
Theory II, 371-462, 1972], the first case produces a change of level only in the binaural
channel. The second case produces a change in the level in the monaural channel as well.
The curves that describe the relation between threshold and Δt can be thought of as temporal
masking functions. They show, like traditional temporal masking data, that the decay of
backward masking (cases where the N0 segment of the noise precedes an Nπ segment or
where  the  lower  intensity  segment  of  the  noise  precedes  the  higher  intensity  segment)  is 
more  rapid  than  for  forward  masking.  Double-sided  exponential  integration  windows  were 
fit  to  the  forward  and  backward  masking  functions.  The  equivalent  rectangular  duration  of 
the best-fitting window under monaural conditions ranged from 12-26 ms, somewhat larger
than those estimated by Moore et al. [J. Acoust. Soc. Am. 83: 1102-1116, 1988]. The
equivalent rectangular durations for the binaural conditions ranged from 41-83 ms, in the
range estimated by Grantham and Wightman [J. Acoust. Soc. Am. 65: 1509-1517, 1979].
The  observed  differences  between  monaural  and  binaural  conditions  were  taken  as  additional 
evidence that the binaural system responds sluggishly to changing stimulation [Grantham and
Wightman, J. Acoust. Soc. Am. 63: 511-523, 1978]. A paper has been submitted
[Kollmeier and Gilkey, 1989].
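
As a rough illustration of the window measure used above, the following sketch builds a double-sided exponential window and computes its equivalent rectangular duration (area divided by peak height). The time constants are hypothetical values, not fitted parameters from the study, and the function names are assumptions.

```python
import numpy as np

def double_exponential_window(t_ms, tau_before_ms, tau_after_ms):
    """Weight at time t_ms relative to the window peak (weight 1 at t = 0)."""
    t_ms = np.asarray(t_ms, dtype=float)
    # Separate time constants for the skirts before and after the peak.
    tau = np.where(t_ms < 0, tau_before_ms, tau_after_ms)
    return np.exp(-np.abs(t_ms) / tau)

def equivalent_rectangular_duration(tau_before_ms, tau_after_ms,
                                    dt_ms=0.01, span_ms=1000.0):
    """Area under the window divided by its peak, by numerical integration."""
    t = np.arange(-span_ms, span_ms, dt_ms)
    w = double_exponential_window(t, tau_before_ms, tau_after_ms)
    return float(np.sum(w) * dt_ms / np.max(w))

# Hypothetical time constants (ms) for the two skirts of the window.
erd = equivalent_rectangular_duration(tau_before_ms=4.0, tau_after_ms=8.0)
print(f"equivalent rectangular duration: {erd:.2f} ms")
```

For a window with unit peak, the equivalent rectangular duration is simply the sum of the two time constants, so an asymmetric window can have the same duration as a symmetric one.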

Effects of forward masker fringe

In studying the effects of a forward masker fringe, Yost [J. Acoust. Soc. Am. 78:
901-907, 1985] found that the threshold for a brief Sπ signal masked by a brief N0 masking
noise was not changed when an Nπ forward masker fringe was added. This result was
somewhat surprising in light of results such as those of McFadden [J. Acoust. Soc. Am. 40:
1414-1419, 1966], who showed that an N0 forward fringe substantially improved
performance in an N0Sπ detection task, and concluded that the system uses the forward
fringe as a diotic reference against which to detect the dichotic signal. If an N0 forward
fringe provides a useful reference, it might be expected that an Nπ forward fringe would
provide a detrimental reference. Yost's results also seemed to conflict with the
interpretations of Kollmeier and Gilkey [op. cit.], who thought of the Nπ fringe as a forward
masker. They found a gradual decrease in the amount of masking as a function of Δt (the
delay between the onset of the signal and the phase transition in the noise), with a
substantial effect of the Nπ segment of the noise even when the signal was presented well
within the N0 segment of the noise. One possibility was that the function that relates
threshold to Δt for the Nπ forward fringe condition intersects with the function that relates
threshold to Δt for the pulsed masker condition at Δt = 0, even though the functions are
different elsewhere. To resolve these questions, the detectability of an Sπ tonal signal was
investigated as a function of Δt, in the presence of an N0 "masker" that was preceded by
quiet, or by an Nπ "forward fringe" and followed by quiet or by an N0 or Nπ "backward
fringe."  The  results  show  that  the  two  functions  were  indeed  different  and  that  they  did  not 
intersect.  Overall,  the  results  failed  to  replicate  those  of  Yost,  showing  instead  that  the 
presence of the Nπ forward fringe reduced detectability for all subjects under a wide variety
of conditions. The results are a further indication that the auditory system uses information
that does not overlap with the signal in the temporal domain. These data were presented to
the MLD Society and to the Acoustical Society of America [Simpson and Gilkey, J. Acoust.
Soc. Am. 82: S108(A), 1987]. A manuscript has been submitted [Gilkey, Simpson, and
Weisenberger, 1988].

The  interrelation  between  remote  masking  and  suppression 

We are partially replicating the experiments of Wegel and Lane [Physiol. Rev. 23:
226-285, 1924] on remote masking and of Duifhuis [J. Acoust. Soc. Am. 67: 914-927,
1980] on suppression, using modern adaptive psychophysical techniques; the same subjects
participate in both experiments. In the remote masking task, the masker is a 602-Hz
sinusoid and the signal is a simultaneous 1500-Hz sinusoid. In the suppression experiment
the suppressor is a 602-Hz sinusoid, the masker is a 1500-Hz sinusoid, and the signal is a
brief 1500-Hz sinusoid immediately following the masker. The results replicate those of the
early  studies.  A  preliminary  analysis  suggests  that  the  nonlinear  growth  of  remote  masking 
can  be  seen  as  having  two  components,  one  suppressive  and  one  excitatory.  All  of  the 
results appear to be compatible with the prediction of the MBPNL model [Goldstein,
"Updating cochlear driven models of auditory perception: A model for nonlinear auditory
frequency analyzing filters," in H. Bouma and B. Elsendorn (eds.), Working models of
human perception, 19-57, 1988; "Modeling two factor cochlear responses," presented at
"Basic research in a clinical environment," July 5-7, Dedham, MA, 1989].

Molecular psychophysical analyses of models of masking

Monaural  studies.  In  most  studies  of  auditory  masking,  including  those  described 
above,  both  the  stimulus  and  the  performance  of  the  subjects  are  described  by  their  statistical 
properties  (e.g.,  the  average  power  in  the  stimulus  and  the  average  probability  of  a  correct 
response).  The  outputs  of  models  are  described  by  their  distributional  properties  and  the 
average  performance  of  a  model  is  fit  to  the  average  performance  of  a  subject.  Another 
approach was described by Green [Psychol. Rev. 71: 392-407, 1964] and referred to as
"molecular"  psychophysics.  In  this  approach,  reproducible  noise  is  used  as  a  masker,  such 
that  the  stimulus  can  be  specified  exactly  on  every  trial.  Similarly,  the  responses  of  the 
subject  are  considered  on  a  trial-by-trial  basis.  The  outputs  of  models  are  determined  for 
each  stimulus  and  the  fit  of  the  model  is  evaluated  by  comparing  these  outputs  to  the 
associated  responses  of  the  subjects. 

Gilkey and Robinson [J. Acoust. Soc. Am. 79: 1499-1510, 1986] used computer models
to predict subjects' responses to the individual noise samples of Gilkey, Robinson, and
Hanna [J. Acoust. Soc. Am. 78: 1207-1219, 1985]. The parameters of the models were
manipulated  until  the  outputs  were  best  able  to  predict  the  subjects'  responses.  The 
combination  of  a  50-Hz-wide  filter,  followed  by  a  half-wave  rectifier  and  an  integrator  with  a 
100-200-ms  decay  constant,  predicted  their  responses  relatively  well.  However,  a  model  that 
formed  a  weighted  combination  of  the  outputs  of  several  detectors,  each  of  which  processed 
information  in  a  different  spectral  region,  yielded  even  better  predictions  and  suggested  that 
subjects  compare  the  spectrum  near  the  signal  frequency  to  other  areas  of  the  spectrum.  A 
similar  model  that  processed  information  over  different  temporal  intervals  suggests  that 
subjects  also  compare  the  waveform  during  the  signal  interval  to  the  waveform  immediately 
before  the  onset  of  the  signal. 
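
The single-channel model described above (filter, half-wave rectifier, leaky integrator) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the brick-wall FFT filter, the first-order integrator, the sampling rate, and the stimulus levels are assumptions, and only the 50-Hz bandwidth and a decay constant in the 100-200-ms range are taken from the text.

```python
import numpy as np

def bandpass_fft(x, fs, f_center, bandwidth):
    """Brick-wall band-pass filter applied in the frequency domain (a simplification)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = np.abs(freqs - f_center) <= bandwidth / 2
    return np.fft.irfft(spectrum * keep, n=len(x))

def leaky_integrate(x, fs, decay_s):
    """First-order leaky integrator with an exponential decay constant in seconds."""
    a = np.exp(-1.0 / (decay_s * fs))
    out = np.empty(len(x))
    acc = 0.0
    for i, value in enumerate(x):
        acc = a * acc + (1.0 - a) * value
        out[i] = acc
    return out

def decision_variable(x, fs, f_center=500.0, bandwidth=50.0, decay_s=0.15):
    """Filter, half-wave rectify, integrate, and sample at the end of the interval."""
    filtered = bandpass_fft(x, fs, f_center, bandwidth)
    rectified = np.maximum(filtered, 0.0)
    return float(leaky_integrate(rectified, fs, decay_s)[-1])

rng = np.random.default_rng(1)
fs = 8000                                       # sampling rate in Hz (assumed)
t = np.arange(int(0.3 * fs)) / fs
noise = rng.standard_normal(len(t))             # one "reproducible" noise sample
tone = 0.5 * np.sin(2 * np.pi * 500.0 * t)      # hypothetical 500-Hz signal

dv_noise = decision_variable(noise, fs)
dv_signal = decision_variable(noise + tone, fs)
print(dv_noise, dv_signal)
```

In a molecular analysis, a value like this would be computed for every reproducible noise sample and compared with the subject's responses to that same sample.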

Gilkey and Robinson [op. cit.] investigated a fairly small set of reproducible noise
samples  (25  noise-alone  and  100  signal-plus-noise  samples).  We  have  recently  replicated 
and  extended  their  finding  using  a  larger  set  of  reproducible  noise  samples  (150  noise-alone 
and  150  signal-plus-noise  samples).  Again,  we  began  with  a  simple  model  composed  of  a 
single-tuned  filter  centered  at  the  signal  frequency,  followed  by  a  half-wave  rectifier  and  an 
integrator  with  a  leak.  We  sampled  the  output  of  the  integrator  at  the  end  of  the  signal 
interval  as  the  decision  variable  of  the  model.  Our  procedures  had  greater  precision  than 
those of Gilkey and Robinson and yielded best-fitting bandwidths that were somewhat smaller
than  theirs  (in  the  range  of  26-49  Hz  across  subjects)  and  best-fitting  decay  constants  that 
were  somewhat  shorter  (in  the  range  of  39-100  ms  across  subjects).  We  also  investigated 
alternate  or  additional  cues  that  the  auditory  system  might  be  using  to  determine  the 
presence  of  the  signal.  Specifically,  we  have  considered  cues  related  to  the  regularity  of  the 
envelope  and  the  regularity  of  the  fine  structure  of  the  waveform  at  the  output  of  the 
model's  initial  filter.  Preliminary  results,  however,  suggest  that  these  cues  will  not  add 
greatly to the proportion of predicted variance. We also considered multichannel models
that  combine  the  output  of  several  detectors  that  process  information  in  different  spectral 
regions.  The  obtained  spectral  weighting  functions,  which  describe  how  the  model  weights 
information  across  frequency,  were  quite  similar  to  those  of  Gilkey  and  Robinson  and 
suggest  that  subjects  compare  information  in  different  spectral  regions.  A  significant 
advancement has been the description of these weighting functions with a relatively simple
equation,  which  can  be  interpreted  as  the  difference  between  "excitatory"  and  "inhibitory" 
Gaussian-shaped  weighting  functions.  We  also  examined  models  that  combine  the  output  of 
a  single  filter  over  several  brief  temporal  windows.  The  obtained  temporal  weighting 
functions  were  quite  similar  to  those  of  Gilkey  and  Robinson,  and  suggest  that  subjects 
compare  information  immediately  before  the  signal  onset  to  information  during  the  signal 
interval.  These  results  strongly  question  classical  critical  band  theory.  Even  though  on 
average  the  subject’s  performance  is  unaffected  by  information  that  is  outside  a  single  critical 
band,  when  the  responses  to  individual  reproducible  noise  samples  are  investigated,  it  can  be 
seen  that  the  responses  are  dependent  on  the  pattern  of  spectral  information  across  a 
frequency range that is much wider than a single critical band. That is, subjects are more
likely  to  report  the  presence  of  the  signal  if  the  stimulus  spectrum  is  "peaked"  near  the 
signal  frequency,  independent  of  the  overall  height  of  the  spectrum.  A  talk  describing  some 
of these results was presented to the Acoustical Society of America [Gilkey and Meyer, J.
Acoust. Soc. Am. 82: S92(A), 1987] and at the meeting of the AFOSR program in Auditory
Pattern Recognition. A manuscript is in preparation [Meyer and Gilkey, 1989].
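
A difference-of-Gaussians weighting function of the kind described above might be sketched as follows. The exact equation is not given in this report, so the parameterization, the channel spacing, and all numerical values here are assumptions. The sketch also shows one way such weights could make the decision variable independent of overall level: if the weights are adjusted to sum to zero and applied to band levels in dB, a uniform level change cancels.

```python
import numpy as np

def dog_weights(freqs_hz, f_center, sigma_exc, sigma_inh,
                gain_exc=1.0, gain_inh=1.0):
    """Narrow 'excitatory' Gaussian minus a broader 'inhibitory' Gaussian."""
    d = np.asarray(freqs_hz, dtype=float) - f_center
    excitatory = gain_exc * np.exp(-0.5 * (d / sigma_exc) ** 2)
    inhibitory = gain_inh * np.exp(-0.5 * (d / sigma_inh) ** 2)
    return excitatory - inhibitory

# Hypothetical channel centers and shape parameters (Hz).
channels = np.arange(350.0, 651.0, 50.0)
weights = dog_weights(channels, 500.0, sigma_exc=30.0, sigma_inh=120.0,
                      gain_inh=0.4)

# Assumption for illustration: force the weights to sum to zero.
balanced = weights - weights.mean()

rng = np.random.default_rng(3)
band_levels_db = 50.0 + rng.standard_normal(len(channels))   # made-up band levels
dv = float(balanced @ band_levels_db)
dv_louder = float(balanced @ (band_levels_db + 10.0))         # +10 dB overall
print(weights.round(3), dv, dv_louder)
```

With these illustrative parameters the weight is positive at the signal frequency and negative in the remote channels, so the weighted sum responds to a spectral peak near the signal rather than to overall height.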

If the spectral weighting functions derived by Gilkey and Robinson [op. cit.] and Gilkey
and Meyer [op. cit.] provide a correct view of monaural processing, then it should be
possible to add and subtract energy from different regions of the spectrum and change the
observed value of P(y). Ten noise-alone (NA) and ten signal-plus-noise (SN) waveforms were
selected from those investigated
by Gilkey and Meyer. The energy in each of the waveforms was raised or lowered by 6 dB
in each of seven 47-Hz-wide bands, centered from one octave below the signal frequency to
one octave above the signal frequency, yielding 280 "modulated" waveforms. A different
waveform was presented on each trial of a block, and a substantial portion of these
waveforms were unmodulated. Although changes in P(y) with decrements are fairly small in
general, the pattern of results is that which would be expected based on our previous
weighting functions. The pattern for increments is somewhat unexpected. Although the
predicted "inhibitory" effects are observed at low frequencies, the "excitatory" effect is
broader than anticipated, and little evidence of inhibitory effects is observed at high
frequencies. This apparent conflict can potentially be explained by assuming that the subject
looks for a peak in the spectrum anywhere near the 500-Hz signal frequency. Under normal
circumstances a peak, if present, would probably be at the signal frequency. However,
under  the  modulated  noise  conditions  peaks  can  occur  over  a  wide  range.  This  strategy 
would  be  equivalent  to  monitoring  the  output  of  several  weighting-function  models  like  those 
of  Gilkey  and  Meyer,  each  tuned  to  different  frequency  regions,  and  choosing  the  largest 
output  to  use  as  a  decision  variable.  We  are  presently  investigating  this  possibility.  The 
results were presented to the Acoustical Society of America [Gilkey et al., J. Acoust. Soc.
Am. 84: S140(A), 1988b]. A manuscript is planned.
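
The "largest output" strategy proposed above can be sketched as a bank of weighting-function detectors followed by a maximum. This is only one possible reading of the idea: the weight shapes, the logarithmic spacing of the seven bands, the use of mean-removed band levels, and the made-up spectrum are all assumptions.

```python
import numpy as np

def dog_weights(freqs_hz, f_center, sigma_exc=30.0, sigma_inh=120.0, gain_inh=0.4):
    """Hypothetical difference-of-Gaussians weights centered on f_center."""
    d = np.asarray(freqs_hz, dtype=float) - f_center
    return (np.exp(-0.5 * (d / sigma_exc) ** 2)
            - gain_inh * np.exp(-0.5 * (d / sigma_inh) ** 2))

def max_of_detectors(band_levels_db, band_freqs_hz, candidate_centers_hz):
    """Apply one weighting function per candidate center and keep the largest output."""
    # Work on spectral shape: band levels relative to their mean (an assumption).
    shape = np.asarray(band_levels_db, dtype=float) - np.mean(band_levels_db)
    outputs = np.array([dog_weights(band_freqs_hz, fc) @ shape
                        for fc in candidate_centers_hz])
    best = int(np.argmax(outputs))
    return float(outputs[best]), best

# Seven bands from one octave below to one octave above 500 Hz (spacing assumed).
bands = np.geomspace(250.0, 1000.0, 7)
levels = np.full(7, 50.0)      # flat made-up spectrum in dB
levels[5] += 6.0               # 6-dB increment in a band above the signal frequency

dv, best = max_of_detectors(levels, bands, bands)
print(f"largest output {dv:.2f} from the detector centered near {bands[best]:.0f} Hz")
```

In this sketch an increment away from the signal frequency still raises the decision variable, because the detector tuned to the incremented band wins the comparison. That is the kind of broadened "excitatory" effect the strategy is meant to explain.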

Binaural  studies.  We  have  also  been  using  the  molecular  psychophysical  approach  to 
examine  models  of  binaural  hearing.  The  method  is  comparable  to  that  in  the  monaural 
experiments except that the signal is presented 180° out of phase interaurally while the noise
remains diotic. Initially, we have been examining a relatively small set of samples (25
noise-alone and 50 signal-plus-noise samples). Computing the output of a binaural model
for a specific noise sample is not as straightforward as with a monaural model, because in
most binaural models the internal noise must be added at an earlier stage and cannot be
thought  of  as  random  variation  at  the  input  to  the  decision  stage.  We  have  implemented  a 
computer model similar to the E-C model [Durlach, op. cit.]. The model consists of two
initial band-pass filters, one for each ear. The filtered waveform in the left channel is
subject to a random delay and multiplied by a random gain factor (i.e., the "internal noise").
The waveforms in each channel are subjected to fixed delays (equalization), and then the two
channels are subtracted (cancellation). The differenced waveforms are then half-wave
rectified and integrated to obtain the value of the model's decision variable. Because it is
impractical to use Monte Carlo methods, the effect of the internal noise is incorporated by
computing  the  decision  variable  for  each  of  100  equally  probable  combinations  of  random 
time  delay  and  random  gain  factor  from  normal  distributions  of  time  delay  and  gain.  One 
approach  has  been  to  investigate  the  relationship  between  the  internal  noise  parameters  of 
the model (i.e., the random time delay and gain factor) and the ratio of external to internal
noise standard deviations (R) as described by Gilkey, Hanna, and Robinson [J. Acoust. Soc.
Am. 69: S23(A), 1981] and others. Surprisingly, the value of R for the model is not a
monotonic  function  of  the  magnitude  of  the  model's  internal  noise  parameters.  Further, 
there  are  rather  limited  combinations  of  internal  noise  parameters  that  yield  values  of  R 
comparable  to  those  found  with  human  observers. 
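
The structure of such an E-C computation can be sketched as below. This is a simplified illustration under stated assumptions, not the model that was fit to the data: the 100 combinations are formed here as a 10 x 10 grid of equal-probability quantile midpoints, the gain error multiplies the left channel as (1 + error), the equalization delay is zero (as would suit an N0Sπ stimulus), the integrator is a plain mean, and the standard deviations and stimulus levels are illustrative values only.

```python
import numpy as np
from statistics import NormalDist

def bandpass_fft(x, fs, f_center, bandwidth):
    """Brick-wall band-pass filter in the frequency domain (a simplification)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = np.abs(freqs - f_center) <= bandwidth / 2
    return np.fft.irfft(np.fft.rfft(x) * keep, n=len(x))

def fractional_delay(x, fs, delay_s):
    """Delay x by delay_s seconds with an FFT phase shift (circular)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    shifted = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(shifted, n=len(x))

def equiprobable_normal(sd, n):
    """n equally probable values of a zero-mean normal (quantile midpoints); sd > 0."""
    dist = NormalDist(0.0, sd)
    return [dist.inv_cdf((k + 0.5) / n) for k in range(n)]

def ec_outputs(left, right, fs, f_center=500.0, bandwidth=50.0,
               delay_sd_s=1e-4, gain_sd=0.25, n_per_dim=10):
    """Decision variable for each delay/gain combination of the internal noise."""
    left_f = bandpass_fft(left, fs, f_center, bandwidth)
    right_f = bandpass_fft(right, fs, f_center, bandwidth)
    outputs = []
    for delay in equiprobable_normal(delay_sd_s, n_per_dim):
        delayed = fractional_delay(left_f, fs, delay)
        for gain_error in equiprobable_normal(gain_sd, n_per_dim):
            residue = (1.0 + gain_error) * delayed - right_f       # cancellation
            outputs.append(np.mean(np.maximum(residue, 0.0)))     # rectify, integrate
    return np.array(outputs)

rng = np.random.default_rng(2)
fs = 8000                                        # sampling rate in Hz (assumed)
t = np.arange(int(0.3 * fs)) / fs
noise = rng.standard_normal(len(t))              # diotic noise sample
tone = 0.1 * np.sin(2 * np.pi * 500.0 * t)       # hypothetical signal

out_noise = ec_outputs(noise, noise, fs)                 # noise alone
out_signal = ec_outputs(noise + tone, noise - tone, fs)  # signal inverted in one ear
print(out_noise.mean(), out_signal.mean())
```

Averaging the 100 outputs gives one value per waveform, while their spread reflects the contribution of the internal noise for that waveform.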

A  second  approach  has  been  to  compute  the  average  value  of  the  model's  decision 
variable  for  each  waveform.  This  average  value  is  used  to  predict  the  response  of  the  subject 
on a waveform-by-waveform basis. The bandwidth of the initial filters of the model, the
standard deviation of a normal distribution of random delays, and the standard deviation of a
normal distribution of random gains are varied in order to produce the best fit. Bandwidths
fall in the range of 21 to 51 Hz, similar to the range observed for the N0S0 case. Standard
deviations of the delay distribution are between 119 and 200 and correspond roughly to the
values estimated by Colburn and Durlach ["Models of binaural interaction," in E.C.
Carterette and M.P. Friedman (eds.), Handbook of Perception, 467-518, 1978]. For two of
three subjects, the values of the gain factor correspond relatively well to the estimates of
Colburn and Durlach [op. cit.]. For the third subject, however, the gain factor is near zero,
a  somewhat  anomalous  result.  With  these  parameter  values,  between  53  and  60%  of  the 
variance  in  the  data  of  the  subjects  can  be  predicted,  comparable  to  the  single-channel  model 
for  the  diotic  case. 

Next, a linear combination of the average output of seven E-C elements was formed,
each tuned to a different frequency region from 350 to 650 Hz. The bandwidth of the
initial  filter,  the  standard  deviation  of  the  delay  distribution,  and  the  standard  deviation  of 
the  gain  distribution  in  each  channel  were  set  to  the  best-fitting  values  estimated  for  the 
single-channel  model.  We  derived  weighting  functions  described  by  the  difference  between 
excitatory  and  inhibitory  Gaussian  weighting  functions.  The  resulting  spectral  weighting 
functions  are  quite  similar  in  form  to  those  for  the  diotic  case  and  also  produce  a  comparable 
increase  in  the  proportion  of  predicted  variance.  A  linear  combination  of  the  output  of  a 
single E-C element over seven different 21-ms subintervals of the signal interval also
increases  the  proportion  of  predicted  variance.  However,  the  shapes  of  these  temporal 
weighting  functions  are  more  inconsistent  across  subjects  than  was  the  case  for  the  diotic 
condition  and  do  not  have  an  easily  interpretable  form.  These  molecular  studies  of  binaural 
masking were presented at the meeting of the AFOSR Program on Auditory Pattern
Recognition  and  a  manuscript  is  planned. 

Estimates  of  internal  noise  as  a  function  of  signal  frequency.  As  mentioned  above,  the 
internal  noise  parameters  of  a  binaural  model  are  of  critical  importance.  The  relation 
between  R  (the  ratio  of  internal  to  external  noise  standard  deviations)  and  signal  frequency  is 
being investigated under both N0S0 and N0Sπ conditions. The E-C model [Durlach, op.
cit.] suggests that the influence of the random time delay should increase with signal
frequency.  Our  estimates  at  1500  Hz  show  values  of  R  that  are  in  general  slightly  lower 
than  those  at  500  Hz.  Interpretation  of  these  results  is  complicated  by  the  fact,  noted 
earlier,  that  the  relation  between  the  internal  noise  parameters  of  the  E-C  model  and  R  is  not 
simple. 

Overall,  these  "molecular"  results  indicate  that  classical  critical  band  theory  provides  an 
oversimplified  view  of  processing  in  auditory  masking  tasks.  The  weighting  functions 
provide  a  quantitative  description  of  the  way  the  system  compares  information  across 
frequency  and  across  time.  Even  though  diotic  and  dichotic  masking  have  typically  been 
assumed  to  be  governed  by  quite  different  mechanisms,  similar  models  can  be  used  to 
predict  the  responses  of  the  subjects  in  both  cases,  yielding  similar  results. 

Adaptive  staircase  techniques  in  psychoacoustics:  A  comparison  of  human  data  and  a 
mathematical  model 

We compared two common adaptive staircase rules, the "one up-two down" rule and the
"one up-three down" rule [Levitt, J. Acoust. Soc. Am. 49: 467-477, 1971] in combination
with  a  two-alternative  forced-choice  procedure  and  with  a  three-alternative  forced-choice 
procedure.  The  adaptive  staircase  tracks  were  modeled  as  Markov  chains.  The  model 
predicts  that  threshold  estimates  obtained  with  the  adaptive  techniques  should  be  equal  to 
those  derived  with  equivalent  "fixed  signal  level"  techniques.  However,  the  human  data 
indicate  that  the  adaptive  techniques  tend  to  yield  lower  thresholds.  The  model  predicts  that 
the  standard  error  of  a  threshold  estimate  obtained  from  an  adaptive  technique  will  decrease 
and  approach  zero  as  the  number  of  trials  used  to  compute  the  estimate  increases.  The 
human  data  show  greater  variability  than  predicted  and  approach  a  nonzero  value  as  the 
number of trials increases. The predictions of the model suggest that the commonly used
combination of the 2AFC procedure and the "1 up 2 down" rule is the least efficient method
of estimating a threshold and that the 3AFC procedure in combination with the "1 up 3
down" rule is the most efficient method. The human data are less consistent, but generally
show the combination of the 2AFC procedure and the "1 up 2 down" rule to be one of the
least efficient methods. A manuscript has been published [Kollmeier, Gilkey, and Sieben, J.
Acoust. Soc. Am. 83: 1852-1862, 1988].
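
As a rough illustration of what modeling a staircase track as a Markov chain can look like, the following Python sketch treats each pair (signal level, current run of correct responses) as a state and iterates the transition probabilities until the state distribution settles. It is not the model from the published manuscript: the logistic psychometric function, its midpoint and spread parameters, the 1-dB step size, and all function names are assumptions made for illustration.

```python
import math


def p_correct(level_db, n_alternatives, midpoint_db=0.0, spread_db=2.0):
    """Hypothetical logistic psychometric function with a 1/n guessing floor."""
    guess = 1.0 / n_alternatives
    detect = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / spread_db))
    return guess + (1.0 - guess) * detect


def stationary_mean_level(n_down, n_alternatives, levels, n_iter=3000):
    """Mean level visited in steady state by a 1-up, n_down-down track.

    A state is (level index, run of consecutive correct responses).
    The distribution over states is propagated trial by trial.
    """
    n_levels = len(levels)
    top = n_levels - 1
    p = [p_correct(level, n_alternatives) for level in levels]
    # Start from a uniform distribution over all states.
    prob = [[1.0 / (n_levels * n_down)] * n_down for _ in range(n_levels)]
    for _ in range(n_iter):
        nxt = [[0.0] * n_down for _ in range(n_levels)]
        for i in range(n_levels):
            for run in range(n_down):
                mass = prob[i][run]
                # Incorrect response: step up one level and reset the run.
                nxt[min(i + 1, top)][0] += mass * (1.0 - p[i])
                if run + 1 == n_down:
                    # n_down correct in a row: step down one level.
                    nxt[max(i - 1, 0)][0] += mass * p[i]
                else:
                    # Correct, but the run is not yet complete: stay put.
                    nxt[i][run + 1] += mass * p[i]
        prob = nxt
    return sum(levels[i] * sum(prob[i]) for i in range(n_levels))


def target_level(n_down, n_alternatives, midpoint_db=0.0, spread_db=2.0):
    """Level at which p_correct ** n_down equals 0.5."""
    p_target = 0.5 ** (1.0 / n_down)
    guess = 1.0 / n_alternatives
    detect = (p_target - guess) / (1.0 - guess)
    return midpoint_db + spread_db * math.log(detect / (1.0 - detect))


LEVELS_DB = list(range(-20, 21))  # assumed 1-dB steps
```

Under these assumptions, `stationary_mean_level(2, 2, LEVELS_DB)` lies close to `target_level(2, 2)`, the level at which the probability of a correct response is about 0.707, and the "one up-three down" rule settles near the level for about 0.794. These are the convergence points Levitt gives for the two rules.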

Laboratory  development 

As  described  in  our  original  proposal,  a  major  part  of  our  effort  during  the  first  period 
of the grant (July 15, 1986 to March 15, 1988) was spent upgrading and developing our
computer  facilities  for  laboratory  control  and  data  analysis.  In  October  1986  we  installed  a 
new multiuser MicroVAX II for data analysis, graphics, and signal processing to support our
monaural  and  binaural  modeling  efforts.  In  January  1987  we  replaced  the  aging  Nova  4x 
computer  system,  which  had  been  used  for  experiment  control,  but  had  become  increasingly 
unreliable. An existing SMS 1000 PDP-11/73 computer was combined with newly developed
hardware and software for stimulus presentation, response collection, timing, and device
control. Programs were written to control specific experiments on auditory and tactile
perception; they allow rapid access to large sets of waveforms and can present four
independent waveforms with 16-bit accuracy at sample rates up to 40 kHz.

The MicroVAX II has 13 megabytes of main memory, a 71-megabyte (formatted)
Winchester disk and a second 300-megabyte (formatted) Winchester disk, a 95-megabyte
streaming tape drive, and nine serial ports. The following devices are connected to the
system: an LA210 draft printer, two Hewlett-Packard HP7475A pen plotters, a Courier
2400-baud modem, and terminals, including VT330, VT240, and HP2623A graphics
terminals, and VT220 display terminals. One port is reserved for communication with the
PDP-11/73 computer that is used for experiment control. Principal applications of the
MicroVAX  II  are  program  development,  data  analysis,  graphics,  and  modeling.  Fitting 
algorithms,  signal  processing  subroutines,  graphics  software,  and  modeling  programs  have 
been  implemented,  allowing  us  to  analyze  our  experimental  results.  Ethernet  hardware  and 
software  installed  on  the  MicroVAX  II  allow  high-speed  communication  between  the 
MicroVAX  II  and  other  computers  of  the  Research  Department,  including  the  Speech 
Perception Laboratory MicroVAX II that allows access to ILS signal-processing software,
and  a  Research  Department  MicroVAX  II  system  that  allows  access  to  word  processing  and 
spreadsheet  software,  from  terminals  in  the  Signal  Detection  Laboratory.  For  word 
processing  we  have  purchased  an  LN03-plus  laser  printer.  Also  available  on  the  network  are 
four  8-port  terminal  servers.  In  addition,  a  microwave  link  between  CID  and  Washington 
University  permits  communication  between  the  Signal  Detection  Laboratory  and  the 
Washington  University  network  of  computers  on  the  main  campus  and  medical  school 
campus. Presently, there are over 100 computers on the network, providing access to a
variety  of  applications.  These  include  library  search  and  catalog  facilities,  national  and 
international  mail  services  such  as  BITNET  and  ARPANET,  and  signal-processing  and 
statistical  packages. 

The Scientific Micro Systems SMS-1000 Model 40 consists of a PDP-11/73 processor, 4
megabytes of main memory, an 85-megabyte Winchester disk, a 1.2-megabyte floppy disk
drive, six serial ports, and a real-time clock. Peripherals include a VT220 system console
terminal and an LA100 draft printer. Two Micro Technology Unlimited Digisound 16-bit
digital-to-analog and analog-to-digital conversion subsystems provide a total of four channels
of D/A and four channels of A/D, allowing the presentation of independent waveforms to a
maximum of four listeners. A parallel I/O interface permits communication with subject
response boxes and control of programmable attenuators and electronic switches. The
SMS-1000 is used for real-time experiment control and data acquisition for the auditory and
tactile experiments conducted in the Signal Detection Laboratory.

Direct memory access control of the Micro Technology Unlimited DigiSound-16 with a
Q-bus-based computer

High-quality  digital  generation  and  recording  of  sounds  is  essential  for  many  of  our 
experiments.  However,  few  high  quality  (16-bit)  digital-to-analog  and  analog-to-digital 
subsystems were available for the PDP-11/73 (Q-bus-based) computer, and most of these
were  expensive.  After  examining  available  options,  we  decided  on  the  Micro  Technology 
Unlimited  (MTU)  Digisound-16.  However,  in  order  to  use  this  system  we  had  to  overcome 
four  problems.  First,  the  two  "stereo"  channels  within  a  single  Digisound  are  strobed  with  a 
10 µs interchannel delay, producing a detectable interaural difference. Second, while the
Digisound-16 has an 8-bit data path, the PDP-11/73 has a 16-bit data path. Third, the
Digisound-16  requires  the  data  for  the  two  stereo  channels  to  be  interleaved  in  its  own  buffer 
memory,  while  it  is  typically  most  convenient  to  store  the  waveforms  in  separate  arrays  in 
computer  memory.  Fourth,  no  specific  mechanism  was  available  to  allow  direct  memory 
access (DMA) control of the Digisound-16. To overcome these problems we designed an
interface that would allow as many as two DRV11-WA DMA interfaces to be connected to as
many as two Digisound-16s. This interface design is now in use on three computer systems
here at CID and benefits a number of other projects, including AFOSR grant #86-0335,
"Auditory-acoustic basis of consonant perception," J.D. Miller, PI. MTU has used this
design to produce a commercially available device, which we hope will be of benefit to other
auditory scientists. A manuscript describing the design of this interface is in preparation
[Gilkey and Partridge, 1989].
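
The interface described above is a hardware design, and its details are not given here. Purely to illustrate the data reorganization implied by the second and third problems (16-bit samples passing through an 8-bit path, and two separately stored channels that must be interleaved), a software sketch in Python might look as follows. The function name, the two's-complement sample coding, the channel order, and the byte order are all assumptions, not properties of the Digisound-16 taken from the text.

```python
def interleave_to_bytes(left, right, low_byte_first=True):
    """Interleave two 16-bit channels and split every sample into two bytes.

    `left` and `right` are equal-length sequences of signed 16-bit integers.
    Channel order (left, then right) and byte order are assumed.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    out = bytearray()
    for left_sample, right_sample in zip(left, right):
        for sample in (left_sample, right_sample):
            if not -32768 <= sample <= 32767:
                raise ValueError("sample does not fit in 16 bits")
            word = sample & 0xFFFF          # two's-complement 16-bit word
            low, high = word & 0xFF, word >> 8
            out.extend((low, high) if low_byte_first else (high, low))
    return bytes(out)
```

Keeping the two waveforms in separate arrays and interleaving only at transfer time matches the storage preference stated above; in the actual system this step is handled by the interface rather than by software.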

Software  generation  of  reproducible  noise 


A  software  shift-register  noise  generator  has  been  implemented  to  generate  reproducible 
noise for our experiments. Given three input values, this program will generate
reproducible two-state binary noise of arbitrary length (up to 5.2 days), which, when
filtered, is approximately white and Gaussian. A paper describing this program and some of
the properties of the noise it produces has been published [Gilkey, Robinson, and Frank, J.
Acoust. Soc. Am. 83: 829-831, 1988].
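
The published program is not reproduced here. As a minimal sketch of the general technique, the Python function below implements a linear-feedback shift register defined by the recurrence a[k] = a[k-n] XOR a[k-m]. The arguments `length` (n), `tap` (m), and `seed` are hypothetical stand-ins for the three input values mentioned above, which the text does not identify, and the filtering step is omitted.

```python
from collections import deque


def shift_register_noise(length, tap, seed, n_samples):
    """Generate reproducible two-state (+1/-1) noise from a shift register.

    Each new bit is the XOR of the bit `length` steps back and the bit
    `tap` steps back. `seed` supplies the initial register contents.
    """
    if not 0 < tap < length:
        raise ValueError("tap must satisfy 0 < tap < length")
    state = seed & ((1 << length) - 1)
    if state == 0:
        raise ValueError("seed must set at least one register bit")
    # Register contents, oldest bit first.
    bits = deque((state >> i) & 1 for i in range(length))
    samples = []
    for _ in range(n_samples):
        new_bit = bits[0] ^ bits[length - tap]
        bits.popleft()
        bits.append(new_bit)
        samples.append(1 if new_bit else -1)
    return samples
```

With `length=5` and `tap=2` the sequence repeats after 2^5 - 1 = 31 samples; longer registers with suitable taps give correspondingly longer periods. The same arguments always yield the same sequence, which is what makes the noise reproducible.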